Influence of Part-of-Speech and Phrasal Category Universal Tag-set in Tree-to-Tree Translation Models

نویسندگان

  • Francisco Oliveira
  • Derek F. Wong
  • Lidia S. Chao
  • Liang Tian
  • Liangye He
چکیده

Tree-to-tree Statistical Machine Translation models require the use of syntactic tree structures of both the source and target side in learning rules to guide the translation process. In order to accomplish the task, available treebanks for different languages are used as the main resources to collect necessary information to handle the translation task. However, since each treebank has its own defined tags, a barrier is inherently created in highlighting alignment relationships at different syntactic levels for different tag-sets. Moreover, these models are typically over constrained. This paper presents a unified tagset for all languages at Part-of-Speech and Phrasal Category level in tree-to-tree models. Different experiments are conducted to study for its feasibility, efficiency, and translation quality.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Real-time quality monitoring in debutanizer column with regression tree and ANFIS

A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs and the outp...

متن کامل

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

متن کامل

Prediction of melting points of a diverse chemical set using fuzzy regression tree

The classification and regression trees (CART) possess the advantage of being able to handlelarge data sets and yield readily interpretable models. In spite to these advantages, they are alsorecognized as highly unstable classifiers with respect to minor perturbations in the training data.In the other words methods present high variance. Fuzzy logic brings in an improvement in theseaspects due ...

متن کامل

Comparison of gestational diabetes prediction with artificial neural network and decision tree models

Background: Gestational diabetes mellitus (GDM) is one of the most common metabolic disorders in pregnancy, which is associated with serious complications. In the event of early diagnosis of this disease, some of the maternal and fetal complications can be prevented. The aim of this study was to early predict gestational diabetes mellitus by two statistical models including artificial neural ne...

متن کامل

Overcoming the uncertainty in a research reactor LOCA in level-1 PSA; Fuzzy based fault-tree/event-tree analysis

Probabilistic safety assessment (PSA) which plays a crucial role in risk evaluation is a quantitative approach intended to demonstrate how a nuclear reactor meets the safety margins as part of the licensing process. Despite PSA merits, some shortcomings associated with the final results exist. Conventional PSA uses crisp values to represent the failure probabilities of basic events. This causes...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013